SUMMARY PAGE


THE PROBLEM

Recent experimental studies have analyzed the time to perform tasks
patterned after standard tests of spatial ability. Based on these analyses,
information-processing models have been developed suggesting that subjects
work through a sequence of component mental processes (e.g., code,
transform, match) to perform spatial test items. If these models are correct,
then response latencies, especially estimates of component-process durations,
may be the best measures of spatial ability. By contrast, traditional
psychometric analyses of these tasks have consistently used overall accuracy
scores as measures of spatial ability.

A model of the relationship between traditional accuracy measures of
spatial ability and theoretically based latency measures is proposed. In this
model, overall accuracy and mean latency are viewed as composite scores
consisting of the product (accuracy) or sum (latency) of component-process
parameters. Three experiments investigated the relationship between spatial
accuracy and latency scores and established some psychometric properties
(reliability, correlation across tests, predictive validity) of various measures.

FINDINGS

While accuracy and mean latency scores each proved to be reliable and
consistent across different tests, the two measures were virtually independent.
Further analyses using component-process latency scores suggest that different
mental processes influence overall accuracy and mean latency. One hypothesis
consistent with the data is that spatial accuracy scores reflect the ability to
accurately code a pictorial stimulus, whereas mean latency scores on the same
items reflect the ability to mentally transform the code. Implications for
ability testing are discussed.


Current address of the author is: Dennis E. Egan, Ph.D., Bell Laboratories,
600 Mountain Avenue, Murray Hill, New Jersey 07974.
INTRODUCTION


Recent experimental studies have analyzed human performance on tasks
adapted from standard tests of mental abilities. In these new analyses, tasks are
broken down into a sequence of component processes, and response time is shown
to consist of the sum of process latencies. Processing models based on response
latency have been developed for spatial visualization (14), letter series
completion (12), verbal and geometric analogies (16), and mental arithmetic (9).

If examinees perform test items by working through a sequence of processes
in real time, then response latencies, especially estimates of
component-process latencies, may be the best measures of the abilities being
tested. In contrast, traditional psychometric analyses of these tasks have
consistently used overall accuracy scores as measures of subjects' ability.
Spatial ability serves as a case in point.

PSYCHOMETRIC ANALYSES OF SPATIAL ABILITY

Until recently, spatial ability had been studied exclusively by using factor
analyses of accuracy scores from batteries of paper-and-pencil, multiple-choice
tests. Kelly (11) and Thurstone (18) were among the first to infer the existence
of a "spatial factor." Since then, efforts have concentrated on isolating two or
more spatial factors through refinements in testing and statistical procedures
(4, 7). Guilford's (6) identification of three spatial factors, cognition of
visual-figural systems (CFS-V), cognition of figural transformations (CFT), and
cognition of kinesthetic-figural systems (CFS-K), represents a current view of
the factor structure of spatial ability. Besides being valid predictors of
success in mechanical and technical training programs, tests loading on the
first two of these factors have also proved to be valid predictors of pilot and
navigator training criteria (7).

INFORMATION-PROCESSING ANALYSES OF SPATIAL VISUALIZATION

Performance on tests loading on the CFT factor has recently been studied
by using response latency to analyze the mental processing of individual items.
Shepard and Metzler (14) studied a task in which pictures of two
three-dimensional block structures were presented and subjects had to decide
whether the two figures were the same or different. The main finding was that
the latency to make a correct "same" response was linearly related to the angle
through which one figure had to be mentally rotated to bring it into congruity
with the other figure. Just and Carpenter (10) found that eye movements and
response latencies indicate that subjects work through a sequence of three
processes (search, transformation, and confirmation) as they perform the block
rotation task. In related tasks, Shepard and Feng (13) found a linear
relationship between the number of operations required for mental paper-folding
items and the latency of response, and Cooper and Shepard (2) extended the
findings to the mental rotation of individual letters.

A MODEL OF ACCURACY AND LATENCY MEASURES FOR COGNITIVE TASKS

How might subjects' characteristic accuracy and response latency on a
cognitive task be related? Suppose subjects work through a sequence of
processes to arrive at an answer to a test item. Figure 1 depicts such a
sequence. For spatial tasks, Process 1 might be coding the visual stimulus,
Process 2 might be transforming the coded representation in some way (e.g.,
rotation), and so forth, with a final process outputting the response by
pushing a button. While it is not the present intention to test a
component-process model for spatial tasks, models that follow this approach
have been developed elsewhere (3, 10).

Given that subjects work through a sequence of processes, the performance of
Subject $i$ on Process $j$ is characterized by two parameters:

$$p_{ij} = P(\text{Subject } i \text{ completes Process } j \text{ correctly}), \text{ and}$$

$$t_{ij} = \text{time taken for Subject } i \text{ to complete Process } j \text{ correctly}.$$

These parameters cannot be estimated directly unless component processes are
isolated (16, 17, 19). However, the following quantities are natural to
consider because they are easily estimated and reasonably reliable:

$$p_{i\cdot} = \prod_j p_{ij} = P(\text{Subject } i \text{ completes all processes correctly}), \text{ and}$$

$$t_{i\cdot} = \sum_j t_{ij} = \text{time taken for Subject } i \text{ to complete all processes correctly}.$$
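As an illustrative sketch (not part of the original analysis), the two composites can be computed from hypothetical component parameters arranged with one row per subject and one column per process. The function name `composite_scores` and all numbers are assumptions made for illustration; in the experiments themselves $p_{ij}$ and $t_{ij}$ are not directly observed.

```python
import numpy as np

def composite_scores(p, t):
    """Return the composites (p_i., t_i.) for each subject.

    p[i, j]: probability that subject i completes process j correctly
    t[i, j]: time (sec.) subject i takes to complete process j correctly
    The accuracy composite is the product over processes; the latency
    composite is the sum over processes.
    """
    p = np.asarray(p, dtype=float)
    t = np.asarray(t, dtype=float)
    return p.prod(axis=1), t.sum(axis=1)

# Two hypothetical subjects, three processes (e.g., code, transform, output)
p = [[0.95, 0.80, 0.99],
     [0.90, 0.90, 0.99]]
t = [[0.9, 1.5, 0.5],
     [0.6, 2.4, 0.4]]
p_comp, t_comp = composite_scores(p, t)
print(p_comp, t_comp)
```

Here the first hypothetical subject has the lower accuracy composite but the shorter latency composite, so the two scores need not rank subjects in the same way.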


If the test items are homogeneous, then the proportion of items a subject gets
correct* is an estimate of $p_{i\cdot}$, and the mean time taken to answer
correctly is an estimate of $t_{i\cdot}$. The measure $p_{i\cdot}$ should be
closely related to traditional accuracy scores, while $t_{i\cdot}$ is the sum of
theoretical process latencies.

*Proportion correct and mean correct response latency are actually biased
estimates of $p_{i\cdot}$ and $t_{i\cdot}$ because responses may result from
guesses or from a processing sequence in which more than one process is
incorrect. For example, a sequence of two processes in a two-choice task may
produce a correct response when both processes are completed correctly or when
both processes are carried out incorrectly. While the bias will be ignored in
this discussion, it may be quite important for tests composed of very difficult
items.

[Figure 1. Accuracy and latency parameters of a component-process model
(P(Subject i completes Process j); time for Subject i to complete Process j)
and definitions of composite accuracy and latency scores.]
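The footnote's two-process example can be checked by enumeration. This sketch assumes, hypothetically, that each incorrectly executed process flips a binary Yes/No answer, so a response is correct whenever an even number of processes fail; guessing is ignored. The function name `prob_correct_response` is illustrative.

```python
from itertools import product
from math import prod

def prob_correct_response(p):
    """P(correct Yes/No response) when each failed process flips the answer.

    p: per-process success probabilities for one subject.
    A response is correct iff an even number of processes fail.
    """
    total = 0.0
    for outcome in product([True, False], repeat=len(p)):
        if outcome.count(False) % 2 == 0:
            total += prod(pj if ok else 1 - pj for pj, ok in zip(p, outcome))
    return total

p = [0.9, 0.8]
print(prod(p))                    # p_i.: all processes correct, about 0.72
print(prob_correct_response(p))   # implied proportion correct, about 0.74
```

Under these assumptions the proportion correct overstates $p_{i\cdot}$ by the probability that both processes fail (here .02), and the gap grows as the processes become less accurate, which is consistent with the footnote's caution about very difficult items.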

What relationship ought to exist between $p_{i\cdot}$ and $t_{i\cdot}$? It should
be clear that $p_{i\cdot}$ and $t_{i\cdot}$ are both composite scores and that the
character of these composites depends on the variability and intercorrelation
of their components. Thus the correlation of the two composite scores may vary
considerably, depending on which processes influence them.

Let us take the case of a sequence of two component processes for a task.
Then for subject $i$ the probability of completing both processes correctly is
$p_{i\cdot} = p_{i1} p_{i2}$, and the time taken to complete both processes
correctly is $t_{i\cdot} = t_{i1} + t_{i2}$. Let us further assume that
$t_{i1}$, $t_{i2}$, $p_{i1}$, and $p_{i2}$ have a multivariate normal
distribution. This rather general model has 14 parameters for this example,
that is, 4 means ($\bar t_1, \bar t_2, \bar p_1, \bar p_2$), 4 variances
($s_{t_1}^2, s_{t_2}^2, s_{p_1}^2, s_{p_2}^2$), and $\binom{4}{2} = 6$
correlations between variables. To make the model more tractable, we will add
three further assumptions about the correlations. First, we will assume that
$r_{t_1 t_2} = r_{p_1 p_2} = \rho_B$. That is, correlations between processes
are equal for latency and accuracy measures. Second, we will assume that
$r_{t_1 p_1} = r_{t_2 p_2} = \rho_W$. That is, correlations between latency and
accuracy within processes are equal for both processes. Third, we will assume
that once $\rho_B$ and $\rho_W$ are known, $r_{t_1 p_2}$ and $r_{t_2 p_1}$ add
no further information. That is, the latency of one process correlates with the
accuracy of the other only to the extent determined by $\rho_B$ and $\rho_W$,
namely $r_{t_1 p_2} = r_{t_2 p_1} = \rho_B \rho_W$.


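Given these assumptions, we have:

$$E(t_{i\cdot}) = \bar t_1 + \bar t_2$$

$$\operatorname{Var}(t_{i\cdot}) = s_{t_1}^2 + s_{t_2}^2 + 2\rho_B s_{t_1} s_{t_2}$$

$$E(p_{i\cdot}) = \bar p_1 \bar p_2 + \rho_B s_{p_1} s_{p_2}$$

$$\operatorname{Var}(p_{i\cdot}) = \bar p_1^2 s_{p_2}^2 + \bar p_2^2 s_{p_1}^2 + s_{p_1}^2 s_{p_2}^2 (1 + \rho_B^2) + 2\rho_B s_{p_1} s_{p_2} \bar p_1 \bar p_2$$

$$\operatorname{Cov}(t_{i\cdot}, p_{i\cdot}) = \rho_W \left[ \bar p_1 s_{p_2} (\rho_B s_{t_1} + s_{t_2}) + \bar p_2 s_{p_1} (\rho_B s_{t_2} + s_{t_1}) \right]$$

Then $R$, the correlation between $t_{i\cdot}$ and $p_{i\cdot}$, is

$$R = \operatorname{Cor}(t_{i\cdot}, p_{i\cdot}) = \frac{\operatorname{Cov}(t_{i\cdot}, p_{i\cdot})}{\sqrt{\operatorname{Var}(t_{i\cdot}) \operatorname{Var}(p_{i\cdot})}}$$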
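The step behind the covariance formula can be made explicit (this derivation is supplied here as a supporting step, not taken from the report). If $X$, $Y$, and $Z$ are jointly normal, their third central moment vanishes, so

$$\operatorname{Cov}(X, YZ) = \mu_Z \operatorname{Cov}(X, Y) + \mu_Y \operatorname{Cov}(X, Z).$$

Applying this with $X = t_{i1}$ and then $X = t_{i2}$, $Y = p_{i1}$, $Z = p_{i2}$, and reading the third assumption as $r_{t_1 p_2} = r_{t_2 p_1} = \rho_B \rho_W$:

$$\operatorname{Cov}(t_{i1}, p_{i\cdot}) = \bar p_2 \rho_W s_{t_1} s_{p_1} + \bar p_1 \rho_B \rho_W s_{t_1} s_{p_2},$$

$$\operatorname{Cov}(t_{i2}, p_{i\cdot}) = \bar p_2 \rho_B \rho_W s_{t_2} s_{p_1} + \bar p_1 \rho_W s_{t_2} s_{p_2}.$$

Adding the two lines and factoring out $\rho_W$ gives the covariance expression stated above.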


It is evident from the covariance formula that the underlying relationship of
component accuracy and latency cannot be known by simply observing the
correlation between $t_{i\cdot}$ and $p_{i\cdot}$. For example, a zero
correlation of accuracy and latency within processes implies a zero correlation
of the composites (i.e., $\rho_W = 0$ implies $R = 0$), but the converse is not
true. In fact, $\rho_W$ need not have the same sign as $R$, depending on
$\rho_B$ and the parameter variances: because the means and standard deviations
are positive, the bracketed term in the covariance can be zero or negative only
when $\rho_B < 0$.




The point of this model is to show that accuracy and latency measures of
component processes may be correlated even when accuracy and latency composites
are virtually independent. Put another way, component accuracy and latency may
be measuring the same ability even when composite accuracy and latency
effectively measure different abilities. This can be due simply to the variance
and intercorrelation of component-process parameters combined in composite
scores. In particular, it may be the case that an accuracy composite such as
$p_{i\cdot}$ is heavily influenced by accuracy in one process, while a latency
composite such as $t_{i\cdot}$ is heavily influenced by the latency of a second
process uncorrelated or negatively correlated with the first.
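The following sketch checks the two-process formulas numerically and illustrates this point with hypothetical parameter values (all numbers are invented and do not come from the experiments). It assumes NumPy; `composite_moments` and `simulate` are illustrative names. Under the three assumptions, the correlation matrix of $(t_{i1}, t_{i2}, p_{i1}, p_{i2})$ can be written as the Kronecker product of a within-process and a between-process $2 \times 2$ matrix, which keeps it positive definite whenever $|\rho_B| < 1$ and $|\rho_W| < 1$. As in the model, accuracy parameters are treated as normal, so simulated values can stray outside $[0, 1]$.

```python
import numpy as np

def composite_moments(t_mean, p_mean, s_t, s_p, rho_b, rho_w):
    """Closed-form E and Var of t_i. = t_i1 + t_i2 and p_i. = p_i1 * p_i2, and R."""
    t1, t2 = t_mean
    p1, p2 = p_mean
    st1, st2 = s_t
    sp1, sp2 = s_p
    e_t = t1 + t2
    var_t = st1**2 + st2**2 + 2 * rho_b * st1 * st2
    e_p = p1 * p2 + rho_b * sp1 * sp2
    var_p = (p1**2 * sp2**2 + p2**2 * sp1**2
             + sp1**2 * sp2**2 * (1 + rho_b**2)
             + 2 * rho_b * sp1 * sp2 * p1 * p2)
    cov_tp = rho_w * (p1 * sp2 * (rho_b * st1 + st2)
                      + p2 * sp1 * (rho_b * st2 + st1))
    return e_t, var_t, e_p, var_p, cov_tp / np.sqrt(var_t * var_p)

def simulate(t_mean, p_mean, s_t, s_p, rho_b, rho_w, n=200_000, seed=0):
    """Monte Carlo estimates of the same five quantities."""
    within = np.array([[1.0, rho_w], [rho_w, 1.0]])    # latency vs. accuracy
    between = np.array([[1.0, rho_b], [rho_b, 1.0]])   # process 1 vs. process 2
    corr = np.kron(within, between)                    # order: t1, t2, p1, p2
    sd = np.array([*s_t, *s_p])
    cov = corr * np.outer(sd, sd)
    mean = np.array([*t_mean, *p_mean])
    x = np.random.default_rng(seed).multivariate_normal(mean, cov, size=n)
    t = x[:, 0] + x[:, 1]          # latency composite
    p = x[:, 2] * x[:, 3]          # accuracy composite
    return t.mean(), t.var(), p.mean(), p.var(), np.corrcoef(t, p)[0, 1]

# Accuracy varies mostly in process 1, latency mostly in process 2,
# and the two processes are negatively correlated.
params = dict(t_mean=(2.0, 4.0), p_mean=(0.8, 0.8), s_t=(1.0, 3.0),
              s_p=(0.2, 0.02), rho_b=-0.5, rho_w=0.8)
analytic = composite_moments(**params)
empirical = simulate(**params)
print("analytic R  =", round(analytic[4], 3))   # about -0.079
print("simulated R =", round(empirical[4], 3))
```

With these made-up values, latency and accuracy are strongly positively correlated within each process ($\rho_W = .8$), yet the composites correlate weakly and negatively, which is the kind of divergence the model is meant to expose.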

OBJECTIVES OF PRESENT STUDIES

The present studies were designed to obtain empirical correlations between
accuracy and latency scores on spatial tests. The accuracy scores were
estimates of $p_{i\cdot}$; component-process accuracy scores could not be
estimated reliably in the short testing sessions used. The latency scores ranged
from the mean time taken to answer items correctly (i.e., estimates of
$t_{i\cdot}$) to component-process latencies (i.e., estimates of $t_{ij}$). The
main purpose was to determine the relationship between the new latency measures
of spatial ability and the traditional accuracy scores on the same items.
Additionally, psychometric characteristics of accuracy and latency scores
(reliability, correlations across different tests, validity for predicting
spatially loaded aviation training criteria) were obtained.

Three studies were performed. In the first two studies, subjects were
given standard group tests of spatial ability (as part of an aptitude battery) as
well as tests using items whose content was similar but whose design permitted
the analysis of accuracy and latency of response to individual items. Accuracy
and mean latency scores were correlated, and factors were extracted. In the
third experiment, two tests of spatial ability and a simple test of nonspatial
judgments were administered using a test-retest procedure. This permitted an
analysis of reliability and partial correlation of several component-process
latency scores, as well as an additional assessment of the accuracy-mean latency
relationship. As subjects in these experiments were naval aviation candidates,
the validity of latency and accuracy measures in predicting success in training
programs for pilots and flight officers was also explored.

EXPERIMENTS I AND II

In the first experiment, items were drawn from tests loading on two spatial
factors. Items from the Guilford-Zimmerman Aptitude Survey's* Spatial
Orientation (GZO) subtest and the U.S. Navy's Spatial Apperception Test (SPA)
were used to represent Guilford's CFS-V factor. Items from the
Guilford-Zimmerman Aptitude Survey's Spatial Visualization subtest (GZV)
represented the CFT factor. In the second experiment, a block rotation test
whose standard forms also load on the CFT factor replaced the modified GZO items.

*Permission was obtained from Sheridan Psychological Services, Inc., to modify
the Spatial Orientation and Spatial Visualization subtests of the
Guilford-Zimmerman Aptitude Survey.

PROCEDURE

Test Construction


Spatial Apperception Test. The new version of the SPA designed for
latency scoring (LSPA) was constructed from multiple-choice items from Form A
and Form B of the SPA. The LSPA requires examinees to judge whether a landscape
shown in one panel of a slide is the view that would be seen from the cockpit of
an airplane shown in another panel. For each of 30 landscapes, the standard SPA
presents a set of five airplanes shown at different orientations. An item from
each test is given in Figure 2.

In the SPA, an examinee selects the best choice for each item and has a time
limit of 10 minutes for the entire test. In the LSPA, subjects had a maximum of
15 seconds per item to make a "Yes" or "No" response. The 60 items for the LSPA
were interleaved in order from the two forms of the SPA. Half the items were
randomly selected to be Yes items, and the other half were No items. For Yes
items, the landscape was matched with the correct airplane from the SPA. For No
items, the landscape was paired with a randomly selected false choice.

Spatial Visualization Test. The LGZV (see Figure 2) was constructed in
a similar manner from the 40-item multiple-choice GZV (Form B). The GZV
requires examinees to mentally rotate an alarm clock in a specified sequence and
then judge which of five figures matches its final position. In the GZV, the
examinee has a time limit of 10 minutes for the entire 40-item test. In the LGZV,
subjects were given a maximum of 20 seconds per item to make a Yes or No
response. Items in the LGZV were presented in the same order as they occurred
in the GZV.

Spatial Orientation Test. The LGZO was constructed from Form A of the
GZO (see Figure 2). This test requires examinees to determine whether a symbol
accurately portrays the change in position and direction that has occurred
from the top to the bottom drawing of a motorboat heading toward a coastline.
The time limit on the 60-item GZO is 10 minutes. In the LGZO, subjects were
given a maximum of 15 seconds to respond Yes or No to each item. The order
of presentation was the same in the two tests, and selection of true and false
items in the LGZO was again determined randomly.

Block Rotation Test. For the block rotation test (LBRT), three rigid
three-dimensional block structures were drawn, similar to those used by Shepard
and Metzler (14). Photographs of the drawings and their mirror images were
taken. Each test item consisted of either the same block structure presented in
two different orientations, or a structure and its mirror image. Three sets of
items were constructed, one for each block figure. In each set, 9 items
presented a pair of identical figures at varying orientations, and 9 items
presented a figure and its rotated mirror image. The nine match figures in each
set differed by 0°, ±40°, ±80°, ±120°, or ±160°. The total number of items was
54: 9 match and 9 no-match items for each of the three basic figures. The
deadline for the LBRT was set at 20 seconds for each item.
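The item count can be reproduced by enumerating the design. This is a hypothetical reconstruction, not the original item file: the figure labels are placeholders, the nine angles are read as 0° plus ±40° through ±160°, and mirror-image items are assumed to use the same nine angles as match items, which the report does not state.

```python
from itertools import product

FIGURES = ("figure 1", "figure 2", "figure 3")          # placeholder labels
PAIR_TYPES = ("match", "mirror")                       # same structure vs. mirror image
ANGLES = (0, 40, -40, 80, -80, 120, -120, 160, -160)   # rotation difference, degrees

def lbrt_items():
    """All (figure, pair_type, angle) combinations: 3 x (9 + 9) = 54 items.

    Assumption: mirror-image items reuse the nine match-item angles.
    """
    return list(product(FIGURES, PAIR_TYPES, ANGLES))

items = lbrt_items()
print(len(items))  # 54
```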

Instructions. Instructions for the redesigned tests were simple modifications
of the instructions for the paper-and-pencil forms. The modified instructions
showed examples of the items and explained the use of the test apparatus.
They also included a statement asking subjects to be as accurate as possible,
and informed the subjects of the maximum time allowed for each item.

Test Apparatus

The new tests were given on an automated system that controlled the
presentation, timing, and scoring of two-choice test items for groups of six or
fewer subjects. The system comprised six testing stations, a Kodak Ektagraphic
self-focusing slide projector (Model AF-2), and a centrally located viewing
screen. A UNIVAC 418 computer operating in a real-time mode controlled the
system.

Test stations were arranged in a row parallel to the screen and between the
screen and projector. The screen was 4.2 meters in front of the stations. The
row of stations was placed so that the viewing angle at the two outboard stations
was no larger than 30°. The projected images of test stimuli subtended visual
angles ranging from approximately 12° for the LGZO items to approximately -0°
for the LBRT items. Each station was equipped with a hand-held switch box on
which two response buttons were mounted. The left-hand button was labeled "No"
and the right-hand button was labeled "Yes." Subjects were instructed to hold the
box with both hands and use their thumbs to activate the buttons.

Method


Subjects were given the new tests in the following order: LSPA, LGZV, and
LGZO (Experiment I) or LBRT (Experiment II). All subjects took the LSPA and,
depending on their schedules and the availability of equipment, subsequently
received the remaining tests. Three to five days after taking the new tests,
subjects were given the Guilford-Zimmerman Aptitude Survey. In addition to the
standard versions of the GZO and GZV, this survey includes subtests of Verbal
Comprehension, General Reasoning, Numerical Operations, Perceptual Speed, and
Mechanical Knowledge. These paper-and-pencil forms were administered under
group testing conditions with approximately 25 examinees per group. The SPA
had been given prior to admission to the program, so those scores were obtained
from the subjects' records.
The procedure for the new tests started with the subjects reading the
instruction booklet. When all indicated they understood the instructions, the
experimenter initiated the test. Items were projected onto the screen and
subjects responded by pushing the appropriate button on the switch box. An item
remained in view on the screen until either all subjects responded or the
maximum time limit was reached. Approximately 1.5 seconds after the first of
those events occurred, the slide projector advanced to the next item. After a
succession of six such items, a blank slide was presented for the maximum time
allowed. This served as a short rest. Just prior to the initiation of the next
sequence of six items, a "Ready" signal was given.
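The presentation rule can be sketched as two small functions for estimating how long a session runs. This is an illustration under stated assumptions, not the UNIVAC control program: the function names are hypothetical, `None` marks a subject who has not responded, the "Ready" signal is treated as taking no time, and a rest slide is assumed to follow every complete block of six items.

```python
def item_display_time(latencies, limit):
    """Seconds an item stays on screen for one testing group.

    latencies: one entry per subject at the session; None = no response.
    The slide stays up until everyone has responded or the limit is reached.
    """
    if any(rt is None or rt >= limit for rt in latencies):
        return limit
    return max(latencies, default=limit)

def session_time(items, limit, advance_delay=1.5, block_size=6):
    """Approximate running time of a session, in seconds.

    items: per-item lists of subject latencies, in presentation order.
    After every block of six items, a blank rest slide is shown for `limit` s.
    """
    total = 0.0
    for k, latencies in enumerate(items, start=1):
        total += item_display_time(latencies, limit) + advance_delay
        if k % block_size == 0:
            total += limit  # blank rest slide
    return total

print(session_time([[2.0, 3.0]] * 6, limit=15))  # 6 * (3.0 + 1.5) + 15 = 42.0
```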

Scoring 

Latency of response to an item was defined as the interval between the
onset of presentation and the response to the item. If a subject did not answer
an item by the end of the time limit, the item was scored as wrong, with a
latency equal to the time limit.
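A minimal sketch of this scoring rule for one subject on one test follows. The input format (a list of `(is_correct, latency)` pairs with `None` for an unanswered item) and the function name `score_subject` are assumptions made for illustration.

```python
def score_subject(responses, limit):
    """Number correct, mean correct latency, and per-item latencies.

    responses: (is_correct, latency) pairs in item order; latency is None
    when the subject did not answer before the time limit.
    Unanswered items are scored wrong, with latency equal to the limit.
    """
    latencies, correct_latencies = [], []
    for is_correct, rt in responses:
        if rt is None:                 # no answer before the deadline
            latencies.append(limit)    # scored wrong, latency = time limit
            continue
        latencies.append(rt)
        if is_correct:
            correct_latencies.append(rt)
    n_correct = len(correct_latencies)
    mean_correct = sum(correct_latencies) / n_correct if n_correct else None
    return n_correct, mean_correct, latencies

print(score_subject([(True, 4.0), (False, 6.0), (True, 8.0), (None, None)], 15))
# (2, 6.0, [4.0, 6.0, 8.0, 15])
```

Because timed-out items are wrong, they never enter the mean correct latency; the limit-valued latency matters only for scores computed over all items.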

Subjects

The 134 male subjects were naval Aviation and Flight Officer Candidates at
Pensacola Naval Air Station. Because of scheduling difficulties and equipment
failures, complete data were available for only 31 subjects in Experiment I and
48 in Experiment II. Each subject had been selected for admission into his
respective program on the basis of a battery of screening tests that included
the SPA. Consequently, typical subjects in these studies had greater spatial
ability than average male college graduates.

RESULTS

Psychometric Properties

The means, standard deviations, and reliabilities of accuracy and latency
scores are given in Table I. These data cannot be taken as norms, since the
absolute values of scores, especially latency scores, depend on the design and
calibration of the test apparatus. However, these data do permit several useful
observations.

Split-half reliabilities of latencies were generally high and usually
exceeded the reliabilities of the corresponding accuracy scores on the new tests.
Reliabilities of latencies approximated the levels of reliability typical of
accuracy scores on the standard five-choice tests.
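The split-half coefficients in Table I are corrected for test length. A common way to compute such a coefficient is sketched below, assuming NumPy. The odd-even split and the function names are assumptions, since the report does not say how the halves were formed. Incorrect items can be entered as NaN so that each half's mean latency uses correct responses only.

```python
import numpy as np

def spearman_brown(r_half, factor=2.0):
    """Projected reliability of a test `factor` times as long as the part."""
    return factor * r_half / (1 + (factor - 1) * r_half)

def split_half_reliability(item_scores):
    """Odd-even split-half reliability, stepped up with Spearman-Brown.

    item_scores: rows = subjects, columns = items in presentation order.
    Use 1/0 for accuracy, or latencies with NaN for items that should not
    count (e.g., incorrect items when scoring mean correct latency).
    """
    x = np.asarray(item_scores, dtype=float)
    odd_half = np.nanmean(x[:, 0::2], axis=1)
    even_half = np.nanmean(x[:, 1::2], axis=1)
    r_half = np.corrcoef(odd_half, even_half)[0, 1]
    return spearman_brown(r_half)

print(spearman_brown(0.5))   # a half-test correlation of .50 steps up to about .67
```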

The proportion of items answered correctly was greater on the new versions of
the tests, since the probability of guessing correctly was higher and subjects at
least attempted each problem. The standard versions of the GZO and the GZV are
speeded tests, so items occurring later in these tests may never be attempted.
The lower reliability for the new accuracy scores may be due partly to the
binary-choice format of these tests and partly to the restriction in range of
ability sampled. Reliability data for the standard GZV and GZO for this
population were not collected.

Table I

Means, Standard Deviations, and Reliabilities of Accuracy and Latency Scores

Measure                               N    Mean           S.D.          Reliability(a)
LSPA Number (Proportion) Correct    127    45.30 (.76)    6.65 (.11)    .75
LSPA Correct Latency (sec.)         127     6.02          1.32          .94
LGZV Number (Proportion) Correct    106    26.70 (.67)    4.97 (.12)    .69
LGZV Correct Latency (sec.)         106    10.43          2.25          .84
LGZO Number (Proportion) Correct     32    44.84 (.75)    8.04 (.13)    .93
LGZO Correct Latency (sec.)          32     7.05          1.16          .92
LBRT Number (Proportion) Correct     60    45.88 (.85)    4.78 (.09)    .65
LBRT Correct Latency (sec.)          60     6.39          1.25          .92
SPA Number (Proportion) Correct     133    19.74 (.66)    5.72 (.19)    .71(b)
GZV Number (Proportion) Correct     100    25.03 (.63)    8.16 (.20)    .91(c)
GZO Number (Proportion) Correct     100    30.27 (.50)   11.63 (.19)    .88(d)

(a) Reliabilities computed by split-half technique, correcting for length of test, unless otherwise noted.
(b) Uncorrected alternate-form reliability reported by Gannon (5).
(c) Split-half reliability reported in Guilford and Zimmerman (8).
(d) Reliability estimated by administering the test in two separately timed, equivalent halves, intercorrelating the part scores, and applying the Spearman-Brown formula (8).

The LGZV was the most difficult of the new tests. If corrected for guessing,
its accuracy score would fall considerably below those of the others. The mean
latency for correct responses to LGZV items was the highest, and the time limit
was exceeded on a greater proportion of items from the LGZV (.061) than from
the LSPA (.034), LGZO (.017), or LBRT (.055).
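The effect of a guessing correction can be illustrated with the mean proportions correct from Table I. The report does not say which correction it has in mind; this sketch assumes the classical formula $R - W/(k - 1)$ for $k$ choices and treats every non-correct item, including timeouts, as a wrong answer. The function name `corrected_proportion` is illustrative.

```python
def corrected_proportion(p_correct, n_choices=2):
    """Classical correction for guessing, R - W / (k - 1), as a proportion.

    Assumes every item counts as answered, so W = 1 - R; with two choices
    this reduces to 2R - 1.
    """
    p_wrong = 1.0 - p_correct
    return p_correct - p_wrong / (n_choices - 1)

# Mean proportions correct on the redesigned two-choice tests (Table I)
observed = {"LSPA": 0.76, "LGZV": 0.67, "LGZO": 0.75, "LBRT": 0.85}
corrected = {test: round(corrected_proportion(p), 2) for test, p in observed.items()}
print(corrected)  # {'LSPA': 0.52, 'LGZV': 0.34, 'LGZO': 0.5, 'LBRT': 0.7}
```

Because the correction is linear, applying it to the mean proportion gives the same result as averaging individually corrected scores. Counting timeouts as wrong slightly overstates the penalty relative to versions of the formula that treat omitted items separately.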

Intercorrelations


Correlations among accuracy and latency scores are shown in Table II.
The pattern of correlation suggests that accuracy and latency scores measure
different facets of spatial ability. First, with few exceptions, correlations
among accuracy scores on all the spatial tests were statistically significant
and had a mean r of .45. The highest correlations between accuracy scores
occurred when accuracy on the standard and redesigned forms of the same test was
compared. The correlations between the GZV and LGZV (r = .74) and between the
GZO and LGZO (r = .72) are satisfactorily close to alternate-form reliability
when restriction of range of ability in the sample is considered. The
correlation of accuracy scores on the SPA and LSPA was lower (r = .45) but still
highly significant. This lower correlation may have resulted because subjects
were previously screened partly on the basis of SPA scores, or because SPA
scores were obtained from two different forms of the test given many months
before the experiment. The difference between the CFS-V and CFT factors is not
present in the accuracy data, as correlations among different tests of the same
factor are no higher than correlations among tests of different factors.
Generally, the accuracy data indicate that these tests are measuring a common
process or ability.

A second characteristic of the data in Table II is that the measures of
latency are highly correlated (mean r = .55). Thus the latency of correctly
solving spatial problems was a consistent characteristic of a subject across all
four of the new tests.

Third, correlations between latency and accuracy scores were generally
negative and of low magnitude, with a mean r of -.22. To the extent that a
reliable relationship existed between accuracy and latency, it was in the
direction of the more accurate subjects responding faster. The largest
correlations of this type involved accuracy scores on the GZO and GZV, two
standard tests in which response latency partly determines how many items are
attempted and thus directly influences accuracy scores.
[Table II. Correlations of Spatial Accuracy and Latency Scores]


Factor Analysis


A matrix of correlations among scores from the LSPA, LGZV, LBRT, and the
Guilford-Zimmerman Aptitude Survey was analyzed by a principal-components
procedure with varimax rotation. Each correlation in the matrix was based on a
sample size of at least 44. The rotated factor loadings are given in Table III.

Two distinct factors emerged, one with high loadings for accuracy and the other
with high loadings for latency of response to spatial problems. A third group
factor was also present, with moderate to high loadings for Numerical
Operations, Verbal Comprehension, and General Reasoning. In addition to high
loadings for all spatial accuracy scores, the first factor showed moderate
loadings for General Reasoning and Mechanical Knowledge, a typical pattern for a
spatial ability factor. The only nonspatial test with a substantial loading on
the spatial latency factor was Perceptual Speed, a highly speeded test requiring
detailed pattern matching. This test had a moderate negative loading,
indicating that subjects who answered more items correctly on the Perceptual
Speed test tended to have lower mean latencies.
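The report specifies only "principal components with varimax rotation." The sketch below shows one common way to compute such a solution with NumPy: unrotated loadings from the eigendecomposition of the correlation matrix, then Kaiser's varimax criterion maximized by the usual SVD-based iteration (without Kaiser row normalization). The function names are illustrative, and the test matrix is synthetic, not the study's correlation matrix.

```python
import numpy as np

def principal_components(corr, n_factors):
    """Unrotated principal-component loadings from a correlation matrix."""
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_factors]    # largest eigenvalues first
    return eigvecs[:, order] * np.sqrt(eigvals[order])

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        rotated = L @ R
        col_ss = np.diag((rotated**2).sum(axis=0))   # column sums of squares
        grad = L.T @ (rotated**3 - rotated @ col_ss / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt                                    # nearest orthogonal matrix
        new_crit = s.sum()
        if new_crit < crit * (1 + tol):              # criterion stopped improving
            break
        crit = new_crit
    return L @ R

# Synthetic two-factor structure: variables 0-2 on one factor, 3-5 on another
lam = np.array([[.8, 0], [.7, 0], [.75, 0], [0, .6], [0, .5], [0, .65]])
corr = lam @ lam.T
np.fill_diagonal(corr, 1.0)
unrotated = principal_components(corr, 2)
rotated = varimax(unrotated)
print(np.round(rotated, 2))
```

An orthogonal rotation leaves each variable's communality (row sum of squared loadings) unchanged; it only redistributes variance among the factors, which is why rotated solutions such as Table III can be easier to interpret than unrotated components.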

DISCUSSION

Mean latency of solving spatial problems was a reliable measure and
correlated consistently across several tests of spatial ability. However,
accuracy and latency of solving spatial problems defined distinct factors.
Referring to the model, $t_{i\cdot}$ and $p_{i\cdot}$ for a given task did not
correlate as highly as a set of $t_{i\cdot}$'s or a set of $p_{i\cdot}$'s
obtained for different tasks.

The new accuracy scores correlated highly with the standard accuracy scores,
so they represent what has traditionally been called spatial ability. What do
the latency scores represent? Being able to solve a spatial problem quickly
has little to do with general intelligence, as indicated by the low loadings of
Verbal Comprehension, General Reasoning, and Numerical Operations on the
spatial latency factor. Perhaps spatial mean latency scores merely reflect the
variability in the latency of decision and output processes common to a
variety of tasks.


EXPERIMENT III

The third experiment was designed to determine whether mean latency
scores obtained on the spatial tests have a "spatial" quality or whether they
reflect variation in simple decision and output processes. Two types of analyses
were carried out. First, subjects taking the spatial tests also took a test
measuring simple decision time. This task was designed to include the decision
and output processes involved in the spatial tasks, while omitting the spatial
transformation necessary to make a decision. Second, estimates of
component-process latencies were obtained.
Table III

Rotated Factor Loadings for Spatial Accuracy, Spatial Latency, and
Guilford-Zimmerman Aptitude Survey Scores

                                                Factor
Measure                                  I        II       III
LSPA Number Correct                     .66      -.06      .10
LSPA Mean Correct Latency               .02       .70     -.25
LGZV Number Correct                     .82      -.01      .13
LGZV Mean Correct Latency              -.26       .84      .04
LBRT Number Correct                     .77      -.17      .03
LBRT Mean Correct Latency              -.19       .87      .16
GZAS Verbal Comprehension               .40       .16      .46
GZAS General Reasoning                  .57      -.08      .45
GZAS Numerical Operations              -.08       .09      .85
GZAS Perceptual Speed                   .30       .47      .40
GZAS Spatial Orientation                .76      -.23      .10
GZAS Spatial Visualization              .78       .33      .04
GZAS Mechanical Knowledge               .51      -.35     -.07
Proportion of Variance Accounted For    .37       .14      .10

Specifically, slopes and intercepts of linear response latency functions
across sets of problems in the LGZV and LBRT were calculated for individual
subjects. For the LGZV, a subject was assumed to carry out the following
processing sequence (3): first, code the stimulus; second, transform the coded
representation as indicated by the test item; third, decide whether the
transformed representation matches the answer; fourth, output "Yes" if a match
occurs, or "No" if a mismatch occurs. In this model only the transformation
process is affected by the number of mental turns the item requires. The slope of
the response latency function across classes of items requiring one, two, three,
or four mental turns is thus an estimate of the time taken for each additional
mental transformation. The zero-intercept of that function is an estimate of the
time taken for all other processes: coding, decision, and output.
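The per-subject slope and zero-intercept are ordinary least-squares estimates. A minimal Python sketch, assuming each LGZV response is available as a (number of turns, latency) pair; the function name `fit_latency_line` and the latencies are hypothetical and serve only to illustrate the computation:

```python
from typing import Sequence, Tuple


def fit_latency_line(turns: Sequence[float], latencies: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of latency = intercept + slope * turns for one subject.

    Returns (slope, zero_intercept): the slope estimates the time per additional
    mental turn; the intercept estimates coding + decision + output time.
    """
    n = len(turns)
    if n != len(latencies) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(turns) / n
    mean_y = sum(latencies) / n
    sxx = sum((x - mean_x) ** 2 for x in turns)
    if sxx == 0:
        raise ValueError("items must differ in number of turns to estimate a slope")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(turns, latencies))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # predicted latency at zero turns
    return slope, intercept


# Hypothetical correct-response latencies (sec.) for one subject,
# two items at each of 1, 2, 3, and 4 required turns.
turns = [1, 1, 2, 2, 3, 3, 4, 4]
latencies = [4.1, 4.5, 6.6, 7.0, 8.9, 9.3, 11.4, 11.8]
slope, intercept = fit_latency_line(turns, latencies)
print(f"slope = {slope:.2f} s per turn, zero-intercept = {intercept:.2f} s")
```

With these made-up values the subject needs about 2.4 s per additional turn and about 1.9 s for coding, decision, and output combined.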

For the LBRT, the model proposed by Just and Carpenter (10) for "same"
trials was adopted. The model assumes that subjects work through a sequence of
three processes: search, transformation, and confirmation. Angular disparity
between the two figures has the greatest effect on the transformation process.
Thus, the slope of the response latency function across classes of items
differing in orientation by 0°, 40°, 80°, 120°, or 160° is an estimate of the
additional time taken for mental transformation per 40° increment in the angular
disparity of the block structures. The zero-intercept is an estimate of the
combined latency of those processes unaffected by angular orientation. This
includes some portion of search and confirmation latency, and presumably all
response output latency.

Since the slopes in these two models represent increments of time taken for
additional spatial transformation, they should have a distinctly spatial character.
Intercepts should represent the combined latency of coding, decision, and output
processes. A test-retest procedure was employed to assess the reliability of the
various measures and any effects due to learning.
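Test-retest reliability here is the correlation between the same measure obtained on Day 1 and on Day 2. A short sketch, assuming the Pearson product-moment correlation (the usual choice, though this section does not name the coefficient); the score lists are hypothetical:

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between paired score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary on both occasions")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical mean latencies (sec.) for the same five subjects on two days.
day1 = [4.2, 5.1, 3.8, 4.9, 4.4]
day2 = [3.0, 3.6, 2.9, 3.5, 3.3]
print(f"test-retest r = {pearson_r(day1, day2):.3f}")
```

Because a correlation ignores a uniform shift, an across-the-board practice gain leaves r unchanged; learning effects therefore have to be examined separately, as a change in means from Day 1 to Day 2.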

PROCEDURE 

Test  Construction 


Spatial  Tests.  The  LBRT  was  the  same  test  as  that  used  in  Experiment  II. 
The  LGZV  was  a modified  (50-item)  version  of  the  test  used  in  the  first  two 
studies,  where  the  additional  items  were  simply  second  presentations  of  the  most 
reliable  original  items . 

Yes/No Decision Test. A test of simple Yes/No decisions (LYNT) consisted
of 60 slides, half of which projected the word YES and the other half the word NO
in large black letters on the screen. Subjects simply pressed the button
corresponding to the stimulus word. Items were randomly ordered and grouped in
blocks of six, as in the other tests. Instructions for the LYNT were worded
similarly to the instructions for the other tests.
Test Apparatus


A portable test controller was constructed, and in a new seating
configuration five examinees sat in two staggered rows of test stations. The first
row was 2.6 meters from the screen and the second row was 3.8 meters from the
screen. This configuration resulted in a maximum viewing angle of 13° at the two
outboard stations. The visual angle of test items ranged from approximately 7°
for the LYNT words to 17° for the LBRT figure pairs.
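The visual angles above rest on the usual relation between an object's size $s$, the viewing distance $d$, and the angle it subtends, $\theta = 2\arctan(s / 2d)$. The section does not give the physical sizes of the projected items, so this sketch only implements the formula and its inverse; the printed widths are what a 17° item would measure at each row distance, an illustrative calculation rather than a reported dimension. The function names are hypothetical.

```python
import math


def visual_angle_deg(size_m: float, distance_m: float) -> float:
    """Visual angle (degrees) subtended by an object of size_m viewed from distance_m."""
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))


def size_for_angle(angle_deg: float, distance_m: float) -> float:
    """Inverse relation: object size (meters) that subtends angle_deg at distance_m."""
    return 2 * distance_m * math.tan(math.radians(angle_deg) / 2)


# Row distances from the seating configuration described above.
for distance in (2.6, 3.8):
    width = size_for_angle(17.0, distance)
    print(f"{distance} m: a 17-degree item would be about {width:.2f} m wide")
```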

Method 

Examinees were given the tests in the following order: LGZV, LYNT, and
LBRT. One to three days after the first session, examinees returned for the
second session, in which the first session's procedure was repeated. Assignment
of examinees to test stations on the second day was balanced under the constraint
that no examinee should sit in the same place on both days. The procedure for
the LBRT and LGZV was identical to that in the first two studies. For the LYNT,
each item remained in view for 5 seconds before advancing.

Scoring 

For  each  subject  on  the  LGZV,  the  latency  of  each  response  was  paired 
with  the  number  of  mental  turns  required  by  the  item , and  the  regression  line 
relating  response  latency  to  number  of  turns  was  obtained.  The  slope  and  zero- 
intercept  of  this  least-squares  function  were  used  in  addition  to  overall  accuracy 
and  mean  latency.  For  the  LBRT,  latencies  of  correct  "same"  responses  were 
paired  with  the  angle  of  rotation  required  to  bring  the  two  block  figures  into  con- 
gruity.  Again,  slopes  and  zero-intercepts  of  the  least-squares  regression  lines 
were  used  as  additional  measures. 
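The LBRT scoring step differs from the LGZV one in two ways: only correct "same" responses enter the fit, and the predictor is the angle of rotation. The sketch below uses a hypothetical `Trial` record and made-up latencies, and rescales the slope to seconds per 40° increment to match the units used in the text. The Scoring paragraph does not say whether 0° items entered the individual fits, so the function simply uses whatever correct "same" trials it receives.

```python
from typing import List, NamedTuple, Tuple


class Trial(NamedTuple):
    angle_deg: float   # rotation needed to bring the two figures into congruity
    same: bool         # True for a "same" item
    correct: bool      # True if the response was correct
    latency_s: float   # response latency in seconds


def lbrt_slope_intercept(trials: List[Trial]) -> Tuple[float, float]:
    """Fit latency on angle over correct "same" trials only.

    Returns (slope per 40-degree increment, zero-intercept in seconds).
    """
    points = [(t.angle_deg, t.latency_s) for t in trials if t.same and t.correct]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two correct 'same' trials")
    mean_a = sum(a for a, _ in points) / n
    mean_lat = sum(lat for _, lat in points) / n
    saa = sum((a - mean_a) ** 2 for a, _ in points)
    if saa == 0:
        raise ValueError("trials must differ in angle to estimate a slope")
    sal = sum((a - mean_a) * (lat - mean_lat) for a, lat in points)
    slope_per_degree = sal / saa
    intercept = mean_lat - slope_per_degree * mean_a  # predicted latency at 0 degrees
    return 40.0 * slope_per_degree, intercept


trials = [
    Trial(40, True, True, 2.0),
    Trial(80, True, True, 2.5),
    Trial(120, True, True, 3.0),
    Trial(160, True, True, 3.5),
    Trial(160, True, False, 9.0),  # error: excluded from the fit
    Trial(80, False, True, 4.0),   # "different" item: excluded from the fit
]
slope_40, intercept = lbrt_slope_intercept(trials)
print(f"{slope_40:.2f} s per 40 degrees, intercept {intercept:.2f} s")
```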

Subjects 

Subjects  were  50  naval  aviation  and  flight  officer  candidates.  Due  to 
scheduling  difficulties,  only  41  examinees  were  available  for  the  retest,  so  all 
analyses  were  restricted  to  those  with  complete  data. 

RESULTS 

Group  Data  on  Spatial  Tasks 

LGZV. Performance scores on the four classes of LGZV items were
averaged across subjects, yielding the results shown in Figure 3. On both days,
these group data show that response latency and error rates increased
monotonically as a function of the number of mental turns required. An analysis
of variance of the latency data indicated a significant increase in latency
attributable to turns (F(3,120) = 509.22, p < .001), a significant decrease in
latency from Day 1 to Day 2 (F(1,40) = 178.60, p < .001), but no interaction of
days and turns (F(3,120) = 0.66, p > .10). The linear trend across turns was
highly significant (F(1,40) = 662.15, p < .001) and accounted for 99.3 percent of
the variance due to turns. Consequently, there is ample evidence in the group data
to support the use of slopes and intercepts to characterize subjects' latency of
response.

[Figure 3: response latency plotted against number of turns, with the proportion
of wrong answers in parentheses.]

Figure 3. Latency and accuracy of responses to LGZV items grouped by the
number of mental turns required. Numbers in parentheses represent the proportion
of wrong answers for each class of item. Brackets are ± standard error of mean
latency.
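The statement that the linear trend "accounted for 99.3 percent of the variance due to turns" refers to the share of the turns sum of squares carried by a linear contrast. A sketch assuming equally spaced levels and an equal number of observations per level (so n cancels from the ratio), with the standard orthogonal-polynomial coefficients (-3, -1, 1, 3) for four levels; the class means are hypothetical, since the Figure 3 values are not recoverable here:

```python
from typing import Sequence

# Orthogonal linear contrast for four equally spaced levels (1, 2, 3, 4 turns).
LINEAR_4 = (-3, -1, 1, 3)


def linear_trend_share(means: Sequence[float], coeffs: Sequence[float] = LINEAR_4) -> float:
    """Share of the between-condition sum of squares carried by a linear contrast.

    With equal n per condition, SS_linear = n * (sum c_j * m_j)^2 / sum c_j^2 and
    SS_conditions = n * sum (m_j - grand_mean)^2, so n cancels from the ratio.
    """
    if len(means) != len(coeffs):
        raise ValueError("need one contrast coefficient per condition mean")
    grand = sum(means) / len(means)
    ss_conditions = sum((m - grand) ** 2 for m in means)
    if ss_conditions == 0:
        raise ValueError("condition means do not differ")
    contrast = sum(c * m for c, m in zip(coeffs, means))
    ss_linear = contrast ** 2 / sum(c * c for c in coeffs)
    return ss_linear / ss_conditions


# Hypothetical mean latencies (sec.) for items requiring 1, 2, 3, and 4 turns.
share = linear_trend_share([4.3, 6.8, 9.1, 11.6])
print(f"linear trend share = {share:.3f}")
```

The sketch covers only the proportion; the F-test for the trend in this repeated-measures design uses its own subjects-by-contrast error term, which is not reproduced here.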

LBRT. For each subject on each day, the mean response latency for correct
"same" trials was computed for LBRT items requiring 40°, 80°, 120°, or 160°
rotation. Scores for 0° rotation were excluded, since they were based on only half
as many observations as the other categories and proved to be unreliable. Scores
from the other four categories were averaged across subjects, with the results
shown in Figure 4. The data show a monotonic trend: items requiring greater
rotation take longer to answer and result in more errors. An analysis of variance
performed on the latency data showed a significant increase in latency as more
rotation was required (F(3,120) = 107.99, p < .001) and a significant decrease in
latency from Day 1 to Day 2 (F(1,40) = 122.23, p < .001), but only a small,
nonsignificant interaction (F(3,120) = 1.88, p > .10). The linear trend across
rotation was highly significant (F(1,40) = 180.19, p < .001) and accounted for
93.6 percent of the variance due to rotation. These group data agree with the
original Shepard and Metzler (14) findings and support the use of slope and
intercept measures to characterize the response latency of individual subjects.

Psychometric  Properties  of  Individual  Measures 

The means, standard deviations, reliabilities, and Day 1-Day 2 t-values of
individual measures are given in Table IV. Subjects were more accurate and faster
on Day 2, the effects being larger and statistically significant for the two spatial
tests. As indicated in the group data for both the LGZV and LBRT, the time taken
for coding, decision, and output processes, as measured by intercepts, decreased
dramatically from Day 1 to Day 2. For the LBRT there was also a significant
decrease in slope from Day 1 to Day 2, meaning that a typical subject took less
time for each additional 40° increment in rotation on the second administration of
the test.*


The reliabilities of spatial mean latency scores were lower than previously
indicated. The discrepancy between split-half and test-retest reliabilities
apparently reflects an instability of latency scores over time, at least for subjects
with little practice (1). The reliabilities of mean latency scores were still higher
than the reliabilities of corresponding accuracy scores for the spatial tests.
Slopes and intercepts on the spatial tests, and mean latency on the LYNT, had
somewhat lower reliabilities. The reliability of accuracy scores on the LYNT was
extremely low because nearly everyone was close to perfectly accurate on the
task; this score was excluded from further analyses.

*The apparent discrepancy between this finding and the lack of a Days
× Rotation interaction in the group data is explained by the use of subjects' class
means in the group analysis, rather than the individual data points that were used
to compute slopes and intercepts for each subject.

[Figure 4: response latency (sec.) of correct "same" responses on Day 1 and
Day 2, plotted against angle of required rotation (40°, 80°, 120°, 160°), with
probability of error in parentheses.]

Figure 4. Latency of correct "same" responses and probability of error for
LBRT items grouped by the angle of rotation required to bring the two figures into
congruity. Numbers in parentheses represent the probability of error on each class
of item. Brackets are ±1 standard error of mean latency.

Table IV

Means, Standard Deviations, Reliabilities, and t-values for
Measures on Day 1 and Day 2

                                      Day 1          Day 2          r             t
Measure                               Mean   s.d.    Mean   s.d.    test-retest   Day 1-Day 2
LGZV Number Correctᵃ                  43.34  5.23    44.95  6.31    .70**          2.26*
LGZV Correct Latency (sec.)            7.80  1.48     4.89  1.06    .727**        12.01**
LGZV Least Squares Slope               2.34   .49     2.39   .36    .628**          .52
LGZV Least Squares Zero Intercept      2.25  1.41      .11   .91    .562**        11.67**
LYNT Number Correct                   52.32  1.46    52.73  1.30    .099           1.43
LYNT Correct Latency (sec.)             .35   .07      .34   .05    .629**          .86
LBRT Number Correct                   44.78  4.44    46.02  4.60    .669**         2.16*
LBRT Correct Latency (sec.)            4.30   .88     3.05   .59    .705**        12.79**
LBRT Least Squares Slope                .64   .42      .50   .23    .414**         2.27*
LBRT Least Squares Zero Intercept      2.40  1.12     1.55   .67    .598**         6.09**

ᵃ The worst case of exceeding the deadline was for the LGZV on Day 1, for which
.018 of all responses were over the deadline.
* p < .05
** p < .01
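Table IV reports means, standard deviations, and test-retest correlations, which together determine a paired (dependent-samples) t for the Day 1 to Day 2 change: the standard deviation of the difference scores is $\sqrt{s_1^2 + s_2^2 - 2 r s_1 s_2}$. The sketch below assumes n = 41 (the subjects with complete data) and that the tabled t-values are paired t-tests, which this section does not state explicitly; the function name is hypothetical. With the LBRT Correct Latency row it gives t of about 12.8, close to the tabled 12.79.

```python
import math


def paired_t_from_summary(m1: float, s1: float, m2: float, s2: float, r: float, n: int) -> float:
    """Dependent-samples t for mean(day1) - mean(day2) from summary statistics.

    The standard deviation of the difference scores is sqrt(s1^2 + s2^2 - 2*r*s1*s2).
    """
    var_diff = s1 ** 2 + s2 ** 2 - 2 * r * s1 * s2
    if var_diff <= 0:
        raise ValueError("summary statistics imply a non-positive difference variance")
    return (m1 - m2) / math.sqrt(var_diff / n)


# LBRT Correct Latency row of Table IV, assuming n = 41 subjects with complete data.
t = paired_t_from_summary(4.30, 0.88, 3.05, 0.59, 0.705, 41)
print(f"t(40) = {t:.2f}")
```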

Intercorrelations 


To improve the reliability of measures, each subject was assigned the mean
of his Day 1 and Day 2 scores. These mean scores were correlated, and the
results are given in Table V. The pattern found in the first two studies was
repeated. Spatial accuracy scores correlated significantly (r = .53, p < .01), and
spatial mean latency scores correlated significantly (r = .58, p < .01). In this
study accuracy and mean latency were virtually independent, the mean of the
accuracy-latency correlations being r = .01.

It might be hypothesized that the low accuracy-latency correlations are
due to the mixture of items used. As shown in Figures 3 and 4, different classes
of items on each spatial test resulted in different levels of performance. It is
possible that high accuracy-latency correlations exist within a class of items but
are disguised by the averaging needed to obtain mean latency and overall
accuracy. To check this hypothesis, accuracy and latency scores for individual
subjects on each class of items in the LGZV and LBRT were obtained. The
within-class correlations of accuracy and latency were no higher than the
correlation of mean latency and overall accuracy. In fact, the within-class
correlations of accuracy and latency were distinctly lower than correlations of
accuracy scores across classes and correlations of latency scores across classes.

One question of interest is whether mean latencies from the LGZV and
LBRT merely reflect decision and output processes. To resolve this question, the
partial correlation between mean latency scores on the LGZV and LBRT was
computed, holding the LYNT latency score constant. This partial correlation is
r = .54 (p < .01) for these data, indicating that the relationship between mean
spatial latencies cannot be due entirely to the latency of decision and output
processes.
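The partial correlation removes from both spatial latencies the part that is linearly predictable from Yes/No latency. The standard first-order formula is sketched below. The LYNT correlations with the two spatial latencies are not reported in this section, so the loop uses hypothetical equal values for both, together with the zero-order spatial latency correlation of .58 reported above, to show how far the partial correlation would shrink if decision and output time were the main common factor:

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation between x and y with z held constant."""
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


R_SPATIAL = 0.58  # LGZV-LBRT mean latency correlation reported above
for r_lynt in (0.0, 0.3, 0.6):  # hypothetical LYNT correlation with each spatial latency
    print(f"r with LYNT = {r_lynt:.1f}: partial r = {partial_r(R_SPATIAL, r_lynt, r_lynt):.2f}")
```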

Further evidence of the "spatial" nature of the mean latency scores is
derived from positive correlations between mean latencies and slopes. The LGZV
slope correlated highly with both its own mean latency (r = .70, p < .01) and
that of the LBRT (r = .42, p < .01). The LBRT slope also correlated positively,
but at lower levels, with its own mean latency (r = .30, .10 > p > .05) and with
the LGZV mean latency (r = .16, p > .10).

The two slope measures correlated positively (r = .33, p < .05), as did the
two intercepts (r = .35, p < .05). The strong negative correlations between slopes
and intercepts of the same task shown in Table V may be inflated because the
estimators for these parameters are not independent. A lower bound on the true
magnitude of these correlations was obtained using the correlations between
slopes and intercepts estimated in different sessions. For the LGZV the average
slope-intercept correlation across sessions was r = -.29; for the LBRT it was
r = -.27. In both cases, even these estimates suggest that the two component
processes are negatively correlated. Finally, as expected, response latency to
Yes/No items tended to correlate more strongly with intercepts, which include
decision and output latency, than with slopes, which measure spatial
transformation latency.
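Why the same-session correlation is pushed in the negative direction can be seen in a small simulation. When slope and intercept are estimated from the same noisy latencies, their sampling errors are negatively correlated (for least squares, the error covariance is proportional to minus the mean of the predictor). In the hypothetical setup below, true slopes and true intercepts are drawn independently, so any within-session correlation of the fitted values is purely an estimation artifact; pairing one session's slope with the other session's intercept shares no measurement error, assuming errors are independent across sessions. All parameter values and names are made up for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical design: two items at each of 1-4 required turns.
TURNS = [1, 1, 2, 2, 3, 3, 4, 4]


def ols(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_subjects: int = 2000, noise_sd: float = 1.5, seed: int = 1) -> Tuple[float, float]:
    """Return (within-session, across-session) slope-intercept correlations."""
    rng = random.Random(seed)
    slopes: List[List[float]] = [[], []]
    intercepts: List[List[float]] = [[], []]
    for _ in range(n_subjects):
        true_intercept = rng.gauss(2.0, 0.5)
        true_slope = rng.gauss(2.4, 0.3)  # drawn independently of the true intercept
        for session in (0, 1):
            lat = [true_intercept + true_slope * x + rng.gauss(0.0, noise_sd) for x in TURNS]
            b, a = ols(TURNS, lat)
            slopes[session].append(b)
            intercepts[session].append(a)
    within = (pearson_r(slopes[0], intercepts[0]) + pearson_r(slopes[1], intercepts[1])) / 2
    across = (pearson_r(slopes[0], intercepts[1]) + pearson_r(slopes[1], intercepts[0])) / 2
    return within, across


within, across = simulate()
print(f"within-session r = {within:.2f}, across-session r = {across:.2f}")
```

Under these assumptions the within-session correlation comes out strongly negative even though the true parameters are unrelated, while the across-session value stays near zero.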

Relationship  of  "Spatial  Ability"  to  Component  Processes 

It is possible to ask which of the latency measures are more strongly
related to accuracy scores (i.e., traditional "spatial ability"). The result is
somewhat surprising. The mean of the four accuracy-slope correlations was low
(r = .19) and in the unexpected direction: subjects with slower rates of mental
rotation tended to have higher accuracy scores. On the other hand, all
accuracy-intercept correlations were at least marginally significant, with a mean
of r = -.30: subjects who could rapidly code, decide, and output were more
accurate. In every case the correlation between accuracy and intercepts was
stronger than the correlation between accuracy and mean latency. Thus "spatial
ability" as commonly defined by accuracy scores appears to have more in common
with coding, decision, and output latency than it does with spatial transformation
latency or mean latency.


ANALYSIS  OF  PREDICTIVE  VALIDITY 


PROCEDURE 

The utility of the various measures of spatial ability in predicting
real-world criteria was explored as follows. Training records of the subjects
participating in the first two experiments and other preliminary studies were
examined, and training criteria were correlated with the spatial test scores. The
criteria were academic and performance scores for Navy pilot and flight officer
candidates, obtained at different stages of training up to 18 months after
participating in these experiments. For pilots the criteria used were: (i) the
overall grade from Aviation Officer Candidate School (AOCS), which included
performance in mathematics, physics, and engineering courses; (ii) a one/zero
criterion of pass (1) or fail (0) during flight training; and (iii) for those students
who passed, the flight performance grade obtained in 13 instructional flights prior
to soloing in a light aircraft. For flight officers, the criteria were AOCS grades;
two grades from basic school, one reflecting academic performance (engineering,
navigation, technical training) and one reflecting performance on training flights;
and again a one/zero, pass/fail criterion in basic school.
RESULTS


Correlations between criteria and measures from the LGZV are given in
Table VI. Although the relationships tend to be weak because of the imprecision
of both tests and criteria, certain regularities are apparent. The accuracy score
from the LGZV correlated positively with the criteria. This is consistent with the
findings that aviation training programs require a great degree of "spatial
ability" as traditionally defined (7). It is also consistent with the contention that
the nature of conventional spatial ability scores is preserved in the accuracy
scores obtained by the present procedure. Intercepts and, to a lesser extent, mean
latencies showed a negative relationship to the criteria, subjects with longer
latencies tending to perform worse. In particular, this was true of the intercept's
correlations with AOCS grade and the pass/fail criterion for flight officers. The
rate of spatial transformation as measured by slopes had only a very weak
positive relationship to the criteria. Thus, for these criteria, accuracy scores and
an estimate of the time to code, decide, and output were more successful
predictors than were mean latency or rate of mental rotation. This general pattern
held when the other spatial tests were used as predictors.

GENERAL  DISCUSSION 

PROCESSES  REFLECTED  IN  MEASURES  OF  SPATIAL  ABILITY 
Relationships  Between  Latency  and  Accuracy 

It is useful to distinguish among three kinds of latency-accuracy
relationships. If stimulus conditions are varied and both response latency and
accuracy are measured, latency and accuracy typically show a strong negative
correlation across stimuli. This is true of the present data, where, for example,
items requiring more rotation in the LGZV or a larger single rotation in the LBRT
take longer to answer correctly and produce more errors. In cases such as these,
accuracy and latency are two dependent measures of the same effect. A second
kind of latency-accuracy relationship involves manipulating instructions, payoffs,
or deadlines within stimulus conditions. In this case a speed-accuracy tradeoff
may be produced: fast responses are the result of guessing or incomplete
processing and are therefore more likely to be errors. The extent to which this
occurred in the present studies is not certain. While wrong answers typically took
longer than correct answers of the same type (3), a speed-accuracy tradeoff
cannot be conclusively rejected without more complete data.

The latency-accuracy relationship of primary interest in the present
studies differs from each of the previous types. In the present studies subjects
were allowed to respond within a fairly comfortable deadline to a variety of items.
Characteristic accuracy and latency for individual subjects were measured and
correlated. The two measures provided reliable but distinct information about
the subjects.


Table VI

Correlations of LGZV Measures with Pilot
and Flight Officer Training Criteria

                                     Pilotsᵃ
                                     AOCS     Flight Training   Pre-Solo
Measure                              Grade    Pass/Fail         Flight Grade
LGZV Number Correct                   .24*     .13               .25
LGZV Mean Correct Latency             .15     -.01              -.01
LGZV Least Squares Slope              .03      .14               .09
LGZV Least Squares Zero Intercept     .01     -.19              -.13

                                     Flight Officersᵇ
                                     AOCS     Basic School   Basic School     Basic School
Measure                              Grade    Pass/Fail      Academic Grade   Flight Grade
LGZV Number Correct                   .56**    .16            .42**            .16
LGZV Mean Correct Latency             .19     -.29*          -.03             -.16
LGZV Least Squares Slope              .20      .13            .06             -.02
LGZV Least Squares Zero Intercept    -.34**   -.32**         -.04             -.11

* p < .05
** p < .01
ᵃ The number of pilots starting was 75. Of these, 61 passed flight training and received grades.
ᵇ The number of flight officers starting was 76. Of these, 39 passed basic school and received grades.
WHAT DO MEASURES OF SPATIAL ABILITY MEASURE?


Accuracy scores on the redesigned tests (i.e., estimates of $p$) are
closely related to traditional measures of "spatial ability." This is borne out by
high correlations between accuracy on the standard and redesigned forms of
tests, and by the validity of the new accuracy scores in predicting aviation
training criteria. In terms of the model, the process (or processes) influencing
traditional accuracy scores also contributes a large part of the variance to the new
accuracy scores. Highly accurate performance on this process (or processes)
has come to be called high "spatial ability."

As the results show, traditional "spatial ability" and the mean latency of
response to the same test items loaded on different factors. More specifically,
spatial ability as conventionally defined was only weakly related to spatial
transformation latency. On the other hand, "spatial ability" correlated
significantly with coding, decision, and output latency. One explanation of these
results is that the accuracy score reflects variability mainly in the accuracy of
coding or searching (10) pictorial stimuli, but the mean latency score reflects
substantial variability in the speed of mentally transforming the code.

The hypothesis that "spatial ability" (i.e., accuracy) reflects variation in
coding while mean latency reflects variation in rate of transformation is
consistent with the following findings.

1) Accuracy and mean latency of response to spatial problems loaded on
different factors. This result is expected if (a) the accuracy composite is
influenced by variability in one process (e.g., coding), (b) the latency composite
is influenced by variability in another process (e.g., transformation), and (c) the
two processes are negatively correlated in the population. The latter condition is
true, and logically must be true, in order to account for the significant negative
correlation of intercepts with accuracy when mean latency and accuracy are
virtually independent.

2) Mean latency correlated positively with slopes, which measure rate of
mental transformation. This is direct evidence of the nature of the $t$ composite
for spatial tasks.

3) Spatial accuracy scores correlated negatively with intercepts. This result
follows if the $p$ composites are influenced mainly by a process also measured by
intercepts. The coding process is the most likely candidate: if accuracy were
instead a measure of decision and output processes, then accuracy should have
correlated more highly with the LYNT latency.

4) Practice both improved accuracy and lowered intercepts. If practice or
familiarity with stimuli improves a single process such as coding, then this result
follows.

5) Accuracy and intercepts were the best predictors of aviation training
criteria. This suggests that the aspect of spatial accuracy scores important for
predictive purposes is also measured by intercepts. Of the processes assumed to
be measured by intercepts, coding has the most "spatial" character.

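The logic of finding 1 can be illustrated with a toy simulation. It is a deliberate simplification of the report's model, not the model itself: component latencies are standardized, accuracy is made a linear function of the intercept (coding) process alone rather than a product of component accuracies, and mean latency is the intercept plus the slope times a hypothetical average of 2.5 turns. The correlation $\rho_B$ between the two processes is varied; every parameter value and function name is an assumption chosen for illustration.

```python
import math
import random
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(rho_b: float = -0.3, weight: float = 0.6, mean_turns: float = 2.5,
             n: int = 20000, seed: int = 7) -> Tuple[float, float, float]:
    """Return correlations of accuracy with intercept, slope, and mean latency."""
    rng = random.Random(seed)
    acc, ints, slopes, means = [], [], [], []
    for _ in range(n):
        z1, z2, e = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        intercept = z1  # standardized coding/decision/output latency
        slope = rho_b * z1 + math.sqrt(1 - rho_b ** 2) * z2  # corr(slope, intercept) = rho_b
        accuracy = -weight * intercept + e  # faster coders are more accurate
        acc.append(accuracy)
        ints.append(intercept)
        slopes.append(slope)
        means.append(intercept + mean_turns * slope)  # composite latency is a sum of components
    return pearson_r(acc, ints), pearson_r(acc, slopes), pearson_r(acc, means)


for rho in (-0.3, 0.0, 0.3):
    r_ai, r_as, r_am = simulate(rho_b=rho)
    print(f"rho_B = {rho:+.1f}: r(acc, intercept) = {r_ai:.2f}, "
          f"r(acc, slope) = {r_as:.2f}, r(acc, mean latency) = {r_am:.2f}")
```

Under these assumptions accuracy keeps the same negative correlation with the intercept in every row, but it is nearly independent of mean latency only when $\rho_B$ is negative, and in that case accuracy also correlates positively with the slope, the direction reported above for the accuracy-slope correlations.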
An alternative explanation of these data would begin with the premise that
there need be no correlation between accuracy and latency measures of a given
process or task. In terms of the model, the assumption would be that $\rho_W$
equals zero, so that $K$, the correlation of composite accuracy and latency, must
equal zero. Latency would be construed as measuring a different "trait" (e.g.,
caution or motivation) totally divorced from the content of a test and perhaps only
of minor interest. While this explanation has no difficulty with the near-zero
correlation between accuracy and mean latency, it cannot easily explain several
other aspects of these data. First, under the simplest such theory there is no
reason why intercepts should correlate more strongly with accuracy than mean
latency does. Second, as the results of Experiment III show, practice reduced
errors and lowered the latencies of some components. It seems likely that
different subjects would come to a test with the equivalent of different amounts of
practice, and that this should result in significant negative values of $\rho_W$.
Third, if latency is a measure of a general trait, then correlations between Yes/No
latency and spatial latencies should be as high as the correlation between two
spatial latencies, which is not the case. Rather than measurement of different
traits leading to low accuracy-latency correlations, the view espoused here is that
the differential contribution of processes leads to low correlations of composites.

The available data suggest that $\rho_B$ and $\rho_W$ have negative values for
spatial rotation tasks. For $\rho_B$ this claim is based on the negative correlations
observed between slopes and intercepts. Evidence for a negative $\rho_W$ is less
direct, but is suggested by the significant negative correlation of accuracy and
intercepts.

While these claims are consistent with all the findings, the model of latency and
accuracy proposed here has not been rigorously tested. In further work the
model's assumptions and the effects of violating those assumptions must be
assessed. Doing so will require enough data to reliably estimate
component-process accuracy as well as latency. Minimally, any model of accuracy
and latency scores for spatial tasks must deal with the fact that accuracy and mean
latency may be virtually independent even when accuracy and the latency of a
component process correlate significantly.

IMPLICATIONS  FOR  ABILITY  TESTING 

Since mean latency and accuracy measure distinct factors of spatial ability,
the practice of using standard speeded tests is called into question. If much of
the accuracy variance arises from one process, and much of the latency variance
from another, then a speeded test makes sense only if the two processes correlate
positively ($\rho_B > 0$) and if accuracy and latency correlate negatively within
processes ($\rho_W < 0$). Of course, this need not be the case.

On the other hand, the investigation of component-process latencies is a
promising direction for future research on cognitive abilities. Although
component-process latencies were somewhat less reliable than mean latencies
and overall accuracy, their reliability could be improved by using longer testing
sessions (see 1, 15). Latency measures of component processes almost certainly
will be more reliable than corresponding accuracy measures. In the present
studies, component-process latencies correlated consistently across different
tests and to some extent were predictive of aviation training criteria.
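The gain from longer sessions can be roughed out with the Spearman-Brown formula, $\rho_k = k\rho / (1 + (k - 1)\rho)$, which assumes the added trials are parallel to the existing ones. Slope estimates are not simple sums of item scores, so for them this is only an approximation. The sketch applies it to the LBRT slope test-retest reliability of .414 from Table IV; the function names are hypothetical.

```python
def spearman_brown(r: float, k: float) -> float:
    """Projected reliability when test length is multiplied by k (parallel items assumed)."""
    return k * r / (1 + (k - 1) * r)


def length_factor_for(target: float, r: float) -> float:
    """Length multiplier k needed to raise reliability r to target."""
    if not (0 < r < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return target * (1 - r) / (r * (1 - target))


r_slope = 0.414  # LBRT least-squares slope, test-retest (Table IV)
print(f"doubled length: {spearman_brown(r_slope, 2):.2f}")
print(f"length factor needed for .80: {length_factor_for(0.80, r_slope):.1f}")
```

Under the parallel-trials assumption, doubling the LBRT session would raise the slope reliability to about .59, and roughly 5.7 times as many trials would be needed to reach .80.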

Perhaps more important than psychometric considerations is the possibility
that latency measures of component processes may lead to a greater
understanding of spatial ability and other cognitive abilities. As the present
results suggest, characterizing subjects by reliable, factorially distinct, but rather
gross measures (e.g., accuracy and mean latency) may not be optimal for some
purposes. Such measures, when viewed as composite scores, may actually
disguise important relationships between distinct component processes of a task
and between two measures of the same process. By contrast, component-process
measures may provide more precise estimates of an ability and may aid in
clarifying conceptual relationships among tasks.